Abstract. Biclustering numerical data became a popular data-mining
task in the early 2000s, especially for analysing gene expression
data. A bicluster reflects a strong association between a subset of objects
and a subset of attributes in a numerical object/attribute data table.
So-called biclusters of similar values can be thought of as maximal
sub-tables with close values. Only a few methods address a complete,
correct and non-redundant enumeration of such patterns, which is a
well-known intractable problem, and no formal framework exists. In this
paper, we introduce important links between biclustering and formal
concept analysis. More specifically, we show for the first time that
Triadic Concept Analysis (TCA) provides a suitable mathematical framework
for biclustering. Interestingly, existing TCA algorithms, which usually
apply to binary data, can be used (directly or with slight modifications)
after a preprocessing step to extract maximal biclusters of similar values.
1 Introduction

Numerical data biclustering mainly appeared in the early 2000s as a
first answer to new challenges raised by biological data analysis, and especially
gene expression data analysis [13]. Starting from an object/attribute numerical
data table (e.g. Table 1), the goal is to group together some objects with some
attributes according to the values taken by these attributes for these objects [13].
Accordingly, a bicluster is formally defined as a pair composed of a set of
objects and a set of attributes. Such a pair can be represented as a rectangle in
the numerical table, modulo permutations of rows and columns. Table 1 is a
numerical dataset with objects in rows and attributes in columns; each table
entry is the value taken by the attribute of its column for the object of its row.
Table 2 illustrates the bicluster $(\{g_1, g_2, g_3\}, \{m_1, m_2, m_3\})$ as a grey
rectangle.

There are several types of biclusters in the literature (see [13] for a survey),
depending on the relation between the values taken by their attributes for their
objects. The simplest case can be understood as rectangles of equal values:
a bicluster corresponds to a set of objects for which a same set of attributes
takes exactly the same values, e.g. $(\{g_1, g_3\}, \{m_5\})$. Constant biclusters
only appear in idyllic situations: numerical data are generally noisy.
Accordingly, a straightforward generalization of such biclusters lies in so-called
biclusters of similar values: they are represented by rectangles with almost
identical, say similar, values [13,1,7]. Table 2 illustrates the bicluster of similar
values $(\{g_1, g_2, g_3\}, \{m_1, m_2, m_3\})$, where two values are said to be similar
if their difference is no more than 1. Moreover, this bicluster is maximal: neither
an object nor an attribute can be added without violating the similarity condition.

Only a few methods address a complete, correct and non-redundant enumeration
of such patterns [1,7], which is a well-known intractable problem [13],
and no formal framework exists. In this paper, we show that Formal Concept
Analysis (FCA) [3], and especially Triadic Concept Analysis (TCA) [12],
provides a suitable and well-defined framework for this task: basically, an object
has an attribute under a condition (a value). After a simple scaling procedure
(turning the original data into binary data), a bicluster is represented as a
triadic concept, composed of a set of objects, a set of attributes (both
characterizing the corresponding "rectangle") and a set of values. All sets are
maximal thanks to the existing concept-forming derivation operators of TCA.
This comes with several advantages:

- The trilattice produced with TCA after scaling gives all maximal biclusters
  of similar values for any $\theta$, ordered w.r.t. the similarity of their
  values. Here, two values $w_1, w_2$ of the original data are said to be similar
  iff their difference does not exceed a given parameter $\theta$: we write
  $w_1 \simeq_\theta w_2 \iff |w_1 - w_2| \le \theta$, and
  $w_1 \not\simeq_\theta w_2$ otherwise.

- The well-known notion of frequency takes on a semantics w.r.t. the similarity
  of values. For example, let $(A, B, C)$ be a triconcept, where $A$ is a set of
  objects, $B$ a set of attributes, and $C$ a set of similar values, and let
  $(A, B)$ be the corresponding bicluster. The higher $|C|$, the more similar
  the values of the bicluster. If $|A|$, $|B|$ and $|C|$ are all high, we obtain
  a bicluster represented as a large rectangle of close values.

- Existing algorithms from TCA [3] and n-ary closed set mining [2] can be used
  directly after scaling. We also provide a new algorithm to compute biclusters
  that are maximal for a given $\theta$ only (see algorithm TriMax later on).

- Both the scaling procedure and the computations of algorithm TriMax can be
  directly distributed over several computing cores.

- The method can be adapted to n-ary numerical datasets. For example, with
  $n = 3$, an n-cluster would be a maximal 3-D box of similar values. It can be
  applied to 3D gene expression data, monitoring the behaviour of genes in
  different samples over time. It follows that mining n-dimensional clusters
  can be achieved with (n + 1)-adic concept analysis.

The paper is organized as follows. Firstly, preliminaries regarding TCA are
presented in Section 2. Then Section 3 formally states the problem. It is followed
by the description of our two methods, in Sections 4 and 5 respectively. The
first shows how TCA can help to characterize all maximal biclusters for any
$\theta$, while the second restricts the problem to a user-given $\theta$. This is
followed by experiments on the proposed approaches. Finally, the paper ends with
a discussion and perspectives of further research.



Table 1: A numerical dataset

      m1  m2  m3  m4  m5
g1     1   2   2   1   6
g2     2   1   1   0   6
g3     2   2   1   7   6
g4     8   9   2   6   7



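Table 2: A bicluster of similar values (the bracketed cells, shaded grey in the
original layout, form the bicluster)

      m1   m2   m3   m4  m5
g1   [1]  [2]  [2]    1   6
g2   [2]  [1]  [1]    0   6
g3   [2]  [2]  [1]    7   6
g4    8    9    2     6   7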



2 Triadic Concept Analysis 

We assume that the reader is familiar with the basic notions of Formal Concept
Analysis [3]. Lehmann and Wille introduced Triadic Concept Analysis (TCA) [12].
Data are represented by a triadic context $(G, M, B, Y)$, where $G$, $M$ and
$B$ are respectively called the sets of objects, attributes and conditions, and
$Y \subseteq G \times M \times B$. The fact $(g, m, b) \in Y$ is interpreted as
the statement "object $g$ has the attribute $m$ under condition $b$".

A (triadic) concept of $(G, M, B, Y)$ is a triple $(A_1, A_2, A_3)$ with
$A_1 \subseteq G$, $A_2 \subseteq M$ and $A_3 \subseteq B$ satisfying the following
two statements: (i) $A_1 \times A_2 \times A_3 \subseteq Y$, and (ii) for any
$X_1 \times X_2 \times X_3 \subseteq Y$, $A_1 \subseteq X_1$, $A_2 \subseteq X_2$
and $A_3 \subseteq X_3$ implies $A_1 = X_1$, $A_2 = X_2$ and $A_3 = X_3$.
If $(G, M, B, Y)$ is represented by a three-dimensional table, (i) means that a
concept stands for a 3-dimensional rectangle full of crosses, while (ii)
characterises the component-wise maximality of concepts. For a triadic concept
$(A_1, A_2, A_3)$, $A_1$ is called the extent, $A_2$ the intent and $A_3$ the modus.

To describe the derivation operators, it is convenient to alternatively
represent a triadic context as $(K_1, K_2, K_3, Y)$. Then, for
$\{i,j,k\} = \{1,2,3\}$ with $j < k$, $X \subseteq K_i$ and
$Z \subseteq K_j \times K_k$, the $(i)$-derivation operators are defined by:

$$\Phi : X \mapsto X^{(i)} = \{(a_j, a_k) \in K_j \times K_k \mid (a_i, a_j, a_k) \in Y \text{ for all } a_i \in X\}$$
$$\Phi' : Z \mapsto Z^{(i)} = \{a_i \in K_i \mid (a_i, a_j, a_k) \in Y \text{ for all } (a_j, a_k) \in Z\}$$

These are the derivation operators of a dyadic context; for $i = 3$, for
instance, this context is $K^{(3)} = (K_3, K_1 \times K_2, Y^{(3)})$. Further
derivation operators are defined as follows: for $\{i,j,k\} = \{1,2,3\}$,
$X_i \subseteq K_i$, $X_j \subseteq K_j$ and $A_k \subseteq K_k$, the
$(i,j,A_k)$-derivation operators are defined by:

$$\Psi : X_i \mapsto X_i^{(i,j,A_k)} = \{a_j \in K_j \mid (a_i, a_j, a_k) \in Y \text{ for all } (a_i, a_k) \in X_i \times A_k\}$$
$$\Psi' : X_j \mapsto X_j^{(i,j,A_k)} = \{a_i \in K_i \mid (a_i, a_j, a_k) \in Y \text{ for all } (a_j, a_k) \in X_j \times A_k\}$$

The operators $\Phi$ and $\Phi'$ are called outer operators, and their
composition the outer closure. The dyadic operators $\Psi$ and $\Psi'$ are
called inner operators, and their composition the inner closure. The inner
operators are the derivation operators of the dyadic context
$K^{ij}_{A_k} = (K_i, K_j, Y^{ij}_{A_k})$, where $(a_i, a_j) \in Y^{ij}_{A_k}$
iff $a_i$, $a_j$, $a_k$ are related by $Y$ for all $a_k \in A_k$.
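As an illustration of the operators above, here is a minimal Python sketch that fixes $i = 3$ for the outer operators and $(i, j) = (1, 2)$ for the inner ones. The toy context and all function names are hypothetical and chosen only for this example; they are not taken from the paper or from any TCA library.

```python
# Hypothetical toy triadic context: triples (object, attribute, condition).
OBJECTS = {"g1", "g2"}
ATTRIBUTES = {"m1", "m2"}
CONDITIONS = {"b1", "b2"}
Y = {
    ("g1", "m1", "b1"), ("g1", "m2", "b1"),
    ("g2", "m1", "b1"), ("g2", "m2", "b1"),
    ("g1", "m1", "b2"), ("g2", "m1", "b2"),
}


def outer(X):
    """Phi for i = 3: pairs (g, m) that hold under every condition in X."""
    return {(g, m) for g in OBJECTS for m in ATTRIBUTES
            if all((g, m, b) in Y for b in X)}


def outer_prime(Z):
    """Phi' for i = 3: conditions under which every pair (g, m) of Z holds."""
    return {b for b in CONDITIONS if all((g, m, b) in Y for g, m in Z)}


def inner(A, conds):
    """Psi for (1, 2, A_3): attributes shared by all objects of A under conds."""
    return {m for m in ATTRIBUTES
            if all((g, m, b) in Y for g in A for b in conds)}


def inner_prime(B, conds):
    """Psi' for (1, 2, A_3): objects having all attributes of B under conds."""
    return {g for g in OBJECTS
            if all((g, m, b) in Y for m in B for b in conds)}


def is_triadic_concept(A, B, U):
    """Check that the box A x B x U lies in Y and is maximal in each component."""
    box = {(g, m) for g in A for m in B}
    return (inner(A, U) == set(B)
            and inner_prime(B, U) == set(A)
            and outer_prime(box) == set(U))
```

In this toy context, `({g1, g2}, {m1}, {b1, b2})` is a triadic concept, whereas `({g1}, {m1}, {b1, b2})` is not, because `g2` can be added to the extent.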

From a computational point of view, [3] developed the algorithm Trias for
extracting frequent triadic concepts, i.e. those whose extent, intent and modus
cardinalities are higher than user-defined thresholds (see also [5]). Cerf et al.
presented a more efficient algorithm called Data-Peeler, able to handle n-ary
relations [2], while formal definitions lie in so-called Polyadic Concept
Analysis [13].

3 Notations and problem settings 

A numerical dataset is realized by a many-valued context [3] and we define 
accordingly (maximal) biclusters of similar values. 

Definition 1 (Many-valued context). Let $G$ be a set of objects, $M$ a set
of attributes, $W$ the set of attribute values and $I$ a ternary relation defined
on the Cartesian product $G \times M \times W$. The fact $(g, m, w) \in I$, also
written $m(g) = w$, means that "attribute $m$ takes the value $w$ for the object
$g$". The tuple $(G, M, W, I)$ is called a many-valued context, or simply a
numerical dataset in this paper.

Example 1. Table 1 is a numerical dataset, or many-valued context, with objects
$G = \{g_1, g_2, g_3, g_4\}$, attributes $M = \{m_1, m_2, m_3, m_4, m_5\}$,
values $W = \{0, 1, 2, 6, 7, 8, 9\}$ and, for example, $m_5(g_2) = 6$.

Definition 2 (Bicluster). In a numerical dataset $(G, M, W, I)$, a bicluster is
a pair $(A, B)$ with $A \subseteq G$ and $B \subseteq M$.

Definition 3 (Similarity relation and bicluster of similar values). Let
$w_1, w_2 \in W$ be two attribute values and $\theta \in \mathbb{N}$ be a
user-defined parameter, called the similarity parameter. $w_1$ and $w_2$ are said
to be similar iff $|w_1 - w_2| \le \theta$, and we write $w_1 \simeq_\theta w_2$.
$(A, B)$ is a bicluster of similar values if $m(g) \simeq_\theta n(h)$ for
all $g, h \in A$ and for all $m, n \in B$.

Definition 4 (Maximal bicluster of similar values). A bicluster of similar 
values (A, B) is maximal if adding either an object in A or an attribute in B 
does not result in a bicluster of similar values. 

Example 2 (From Table 1). $(\{g_1, g_4\}, \{m_2, m_4\})$ is a bicluster.
$(\{g_1, g_2\}, \{m_2\})$ is a bicluster of similar values with $\theta \ge 1$.
However, it is not maximal. With $1 \le \theta < 5$,
$(\{g_1, g_2, g_3\}, \{m_1, m_2, m_3\})$ is maximal. Finally, with $\theta = 7$
the bicluster $(\{g_1, g_2, g_3\}, \{m_1, m_2, m_3, m_4, m_5\})$ is maximal. Note
that a constant (maximal) bicluster is a (maximal) bicluster of similar values
with $\theta = 0$.
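The definitions above can be checked mechanically on Table 1. The following Python sketch is only an illustration: the names `DATA`, `is_similar` and `is_maximal` are ours, and pairwise similarity is tested through the equivalent condition that the range of the sub-table does not exceed $\theta$.

```python
from itertools import product

# Table 1 of the paper: objects g1..g4 described by attributes m1..m5.
DATA = {
    "g1": {"m1": 1, "m2": 2, "m3": 2, "m4": 1, "m5": 6},
    "g2": {"m1": 2, "m2": 1, "m3": 1, "m4": 0, "m5": 6},
    "g3": {"m1": 2, "m2": 2, "m3": 1, "m4": 7, "m5": 6},
    "g4": {"m1": 8, "m2": 9, "m3": 2, "m4": 6, "m5": 7},
}
ATTRS = ["m1", "m2", "m3", "m4", "m5"]


def is_similar(data, objs, attrs, theta):
    """True if all values of the sub-table objs x attrs differ by at most theta."""
    values = [data[g][m] for g, m in product(objs, attrs)]
    return max(values) - min(values) <= theta


def is_maximal(data, objs, attrs, theta, all_attrs=ATTRS):
    """True if (objs, attrs) is similar and no object or attribute can be added."""
    objs, attrs = set(objs), set(attrs)
    if not is_similar(data, objs, attrs, theta):
        return False
    for g in set(data) - objs:
        if is_similar(data, objs | {g}, attrs, theta):
            return False
    for m in set(all_attrs) - attrs:
        if is_similar(data, objs, attrs | {m}, theta):
            return False
    return True
```

For instance, `is_maximal(DATA, {"g1", "g2", "g3"}, {"m1", "m2", "m3"}, 1)` holds, while the same bicluster is no longer maximal with `theta = 5`, since `m5` can then be added.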

Thus the problem that we address in this paper is the extraction of all
maximal biclusters of similar values from a numerical dataset. We want the
extraction to be complete, correct and non-redundant, in contrast to several
existing methods of the literature that are based on heuristics [13]. To this
end, we propose in the next section a first method aiming at extracting
biclusters for any similarity parameter $\theta$. This method establishes new
links between biclustering and FCA in general, and TCA in particular. Then, this
methodology is adapted to characterize and extract biclusters that are maximal
for a given $\theta$ only, as usually done in the literature [1,7,13].



4 Biclusters of similar values in Triadic Concept Analysis 



Firstly, we consider the problem of generating maximal biclusters for any
$\theta$. Starting from a numerical dataset $(G, M, W, I)$, the basic idea lies
in building a triadic context $(G, M, T, Y)$ where the first two dimensions
remain formal objects and formal attributes, while $W$ is scaled into a third
dimension denoted by $T$. This new dimension $T$ is called the scale dimension:
intuitively, it gives different "spaces of values" that each object-attribute
pair $(g, m) \in G \times M$ can take. Once the scale is given, a triadic
context is derived, from which triadic concepts are characterized.

We use interordinal scaling [3] to build the scale dimension. It allows
encoding in $2^T$ all possible intervals of values in $W$. This scale allows
deriving a triadic context from which any bicluster of similar values can be
characterized as a triadic concept. We now make these statements precise and
illustrate the whole procedure with examples.

Definition 5 (Interordinal Scaling). A scale is a binary relation
$J \subseteq W \times T$ associating original elements from the set of values
$W$ to their derived elements in $T$. In the case of interordinal scaling,
$T = \{[\min(W), w], \forall w \in W\} \cup \{[w, \max(W)], \forall w \in W\}$.
Then $(w, t) \in J$ iff $w \in t$.

Example 3. Table 3 gives the tabular representation of the interordinal scale for
Table 1. Intuitively, each row describes a single value, while dyadic concepts
represent all possible intervals over $W$. An example of a dyadic concept in this
table is given by $(\{6, 7, 8\}, \{t_6, t_7, t_8, t_9, t_{10}\})$, rewritten as
$(\{6, 7, 8\}, \{[6, 8]\})$ since $\{t_6, t_7, t_8, t_9, t_{10}\}$ represents the
interval $[0, 8] \cap [0, 9] \cap [1, 9] \cap [2, 9] \cap [6, 9] = [6, 8]$.

Table 3: Interordinal scale of the set of attribute values W.
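The interordinal scale of Definition 5 can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: intervals are encoded as `(low, high)` pairs, and the ordering of `T` (lower-bounded intervals first, then upper-bounded ones, with `[min(W), max(W)]` listed once) is an assumption chosen so that the indices match $t_6, \ldots, t_{10}$ in Example 3.

```python
def interordinal_scale(W):
    """Return the intervals [min(W), w] and [w, max(W)] for all w in W."""
    values = sorted(set(W))
    lo, hi = values[0], values[-1]
    lower = [(lo, w) for w in values]          # [min(W), w]
    upper = [(w, hi) for w in values]          # [w, max(W)]
    # [min(W), max(W)] belongs to both families; keep a single copy.
    return lower + [t for t in upper if t not in lower]


def scale_relation(W, T):
    """Binary relation J: (w, t) is in J iff the value w lies in the interval t."""
    return {(w, t) for w in W for t in T if t[0] <= w <= t[1]}


def intersect(intervals):
    """Intersection of closed intervals given as (low, high) pairs."""
    return (max(a for a, _ in intervals), min(b for _, b in intervals))


W = [0, 1, 2, 6, 7, 8, 9]           # attribute values of Table 1
T = interordinal_scale(W)           # t1 = T[0], ..., t13 = T[12]
J = scale_relation(W, T)
```

With this encoding, `intersect(T[5:10])` is the intersection of $t_6, \ldots, t_{10}$ and yields `(6, 8)`, as stated in Example 3.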

Once the scale is defined, we can derive the triadic context w.r.t. this scale.

Definition 6 (Triadic scaled context). Let $Y$ be a ternary relation
$Y \subseteq G \times M \times T$. Then $(g, m, t) \in Y$ iff $(m(g), t) \in J$,
or simply $m(g) \in t$. We call the tuple $(G, M, T, Y)$ the triadic scaled
context of the numerical dataset $(G, M, W, I)$.

Example 4. The object-attribute pair $(g_1, m_1)$ taking value $m_1(g_1) = 1$ is
scaled into triples $(g_1, m_1, t) \in Y$ where $t$ takes any interval in
$\{[0, 1], [0, 2], [0, 6], [0, 7], [0, 8], [0, 9], [1, 9]\}$. The intersection of
the intervals in this set is the original value itself, i.e. $m_1(g_1) = 1$, a
basic property of interordinal scaling. As a result, Table 4 illustrates the
whole scaled triadic context derived from the numerical dataset given in Table 1
using the interordinal scale. The very first cross (x) in this table (upper left)
represents the tuple $(g_2, m_4, t_1)$, meaning that $m_4(g_2) \in [0, 0]$.

We now present our first main result: there is a one-to-one correspondence
between (i) the set of maximal biclusters of similar values in a given numerical
dataset for any similarity parameter $\theta$ and (ii) the set of all triadic
concepts in the triadic context derived with interordinal scaling.

Proposition 1. A tuple $(A, B, U)$, where $A \subseteq G$, $B \subseteq M$ and
$U \subseteq T$, is a triadic concept iff $(A, B)$ is a maximal bicluster of
similar values for some $\theta \ge 0$.

Proof. We leave the proof to the Appendix of the paper, since it requires
notations and propositions that are not necessary in the rest of the paper.

Example 5. For example, $(\{g_1, g_2, g_3\}, \{m_1, m_2, m_3\}, \{t_3, t_4, t_5, t_6, t_7, t_8\})$
is a triadic concept from the context depicted in Table 4. It corresponds to the
maximal bicluster $(\{g_1, g_2, g_3\}, \{m_1, m_2, m_3\})$ with $\theta = 1$.
Indeed, $\theta = 1$ since $\{t_3, t_4, t_5, t_6, t_7, t_8\}$ is maximal (it is a
modus), it corresponds to the interval $[1, 2]$, and naturally $2 - 1 = 1$ is the
length of this interval.

Table 4: Triadic scaled context for Table 1 with interordinal scaling.
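To make Definition 6 and Example 5 concrete, the sketch below builds the scaled relation $Y$ for Table 1 and recovers the modus of the bicluster of Example 5. It is a hedged illustration: the names are ours, intervals are `(low, high)` pairs, and the ordering of `T` is the same assumption as in Example 3 ($t_1 = [0,0], \ldots, t_{13} = [9,9]$).

```python
# Table 1 of the paper.
DATA = {
    "g1": {"m1": 1, "m2": 2, "m3": 2, "m4": 1, "m5": 6},
    "g2": {"m1": 2, "m2": 1, "m3": 1, "m4": 0, "m5": 6},
    "g3": {"m1": 2, "m2": 2, "m3": 1, "m4": 7, "m5": 6},
    "g4": {"m1": 8, "m2": 9, "m3": 2, "m4": 6, "m5": 7},
}

W = sorted({v for row in DATA.values() for v in row.values()})
# Interordinal scale: [min(W), w] for all w, then [w, max(W)] for w > min(W).
T = [(W[0], w) for w in W] + [(w, W[-1]) for w in W[1:]]

# Triadic scaled context: (g, m, t) is in Y iff m(g) lies in the interval t.
Y = {(g, m, t)
     for g, row in DATA.items()
     for m, v in row.items()
     for t in T if t[0] <= v <= t[1]}


def modus(A, B):
    """All intervals t such that the whole box A x B x {t} is included in Y."""
    return [t for t in T if all((g, m, t) in Y for g in A for m in B)]


A = {"g1", "g2", "g3"}
B = {"m1", "m2", "m3"}
U = modus(A, B)                                   # t3, ..., t8
interval = (max(a for a, _ in U), min(b for _, b in U))
theta = interval[1] - interval[0]                 # length of the interval
```

Here `U` contains the six intervals $t_3, \ldots, t_8$, whose intersection is `(1, 2)`, so the corresponding `theta` is 1.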

Hence we showed that extracting biclusters of similar values for any $\theta$
in a numerical dataset can be achieved by (i) scaling the attribute value
dimension and (ii) extracting the triadic concepts in the resulting derived
triadic context.

Interestingly, triadic concepts $(A, B, U)$ with the largest sets $A$, $B$ or
$U$ represent large biclusters of close values. Indeed, the larger $|A|$ and
$|B|$, the larger the data covering of the corresponding bicluster. Furthermore,
the larger $|U|$, the more similar the values of the bicluster $(A, B)$. Indeed,
by the properties of interordinal scaling, the more intervals in $U$, the smaller
their intersection. Mining so-called top-k frequent triadic concepts can
accordingly be achieved with the existing algorithm Data-Peeler [2].

On the other hand, extracting maximal biclusters for all $\theta$ may be
neither efficient nor effective with large numerical data: their number tends to
be very large, and not all biclusters are relevant for a given analysis.
Furthermore, both the size and the density of contexts derived with interordinal
scaling are known to be problematic w.r.t. algorithmic scalability, see e.g. [5].
In existing methods of the literature, $\theta$ is set a priori. We now show how
to handle this case with slight modifications, which is our second main result.

5 Extracting biclusters of similar values for a given $\theta$

In this section we consider the problem of extracting maximal biclusters of
similar values in TCA for a given $\theta$ only. It comes with slight
modifications of the methodology presented in the previous section. Intuitively,
consider the previous scaling applied to a numerical dataset $(G, M, W, I)$. It
scales $W$ into the dimension $T$, and subsets of $T$ characterize all intervals
of values over $W$. To get maximal biclusters for a given $\theta$ only, we
should not consider all possible intervals in $W$, but rather all intervals
(i) having a range size that is less than or equal to $\theta$, to avoid
biclusters with non-similar values, and (ii) having a range size as close as
possible to $\theta$, to avoid non-maximal biclusters. For example, if we set
$\theta = 2$, it is not interesting to consider the interval $[0, 8]$ in the
scale dimension, since $8 - 0 > \theta$. Similarly, considering the interval
$[6, 6]$ may not be interesting either, since a bicluster with all its values
equal to 6 may not be maximal. As introduced in [8], those maximal intervals of
similar values used for the scale are called blocks of tolerance over the set of
numbers $W$ with respect to the tolerance relation $\simeq_\theta$.

Therefore we first recall the basics of tolerance relations over a set of
numbers. This allows us to define a simpler scaling procedure. The resulting
triadic context is then mined with a new TCA algorithm called TriMax to extract
maximal biclusters of similar values for a given $\theta$.

Blocks of tolerance over $W$ are defined as maximal sets of pairwise similar
values from $W$:

Definition 7 (Tolerance blocks from a set of numbers). The similarity
relation $\simeq_\theta$ is called a tolerance relation, i.e. it is reflexive and
symmetric but not transitive. Given a set $W$ of values, a subset
$V \subseteq W$, and a tolerance relation $\simeq_\theta$ over $W$, $V$ is a
block of tolerance if:

(i) $\forall w_1, w_2 \in V$, $w_1 \simeq_\theta w_2$ (pairwise similarity)

(ii) $\forall w_1 \notin V$, $\exists w_2 \in V$, $w_1 \not\simeq_\theta w_2$ (maximality).

From Table 1 we have $W = \{0, 1, 2, 6, 7, 8, 9\}$. With $\theta = 2$, one has
$0 \simeq_2 2$ but $2 \not\simeq_2 6$. Accordingly, one obtains 3 blocks of
tolerance, namely the sets $\{0, 1, 2\}$, $\{6, 7, 8\}$ and $\{7, 8, 9\}$. These
three sets can be renamed as the convex hulls of their elements on
$\mathbb{N}$, respectively $[0, 2]$, $[6, 8]$ and $[7, 9]$: any number lying
between the minimal and the maximal elements (w.r.t. the natural number ordering)
of a block of tolerance is naturally similar to any other element of the block.

To derive a triadic context from a numerical dataset, we simply use tolerance
blocks over $W$ to define the scale dimension.

Definition 8 (TriMax scale relation). The scale relation is a binary relation
$J \subseteq W \times C$, where $C$ is the set of blocks of tolerance over $W$
renamed as their convex hulls. Then, $(w, c) \in J$ iff $w \in c$.

Example 6. From Table 1 we have $C = \{[0, 1], [1, 2], [6, 7], [7, 8], [8, 9]\}$
with $\theta = 1$, and $C = \{[0, 2], [6, 8], [7, 9]\}$ with $\theta = 2$.
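Since the values are totally ordered, the blocks of tolerance can be computed with a single sweep over the sorted values. The following Python function is one possible sketch (the name `tolerance_blocks` and the two-pointer strategy are ours, not necessarily what the authors implemented): for each value `a` it finds the largest value `b` with `b - a <= theta`, and keeps the interval `(a, b)` only if it is not contained in the previously emitted block.

```python
def tolerance_blocks(W, theta):
    """Blocks of tolerance over W, returned as convex hulls (a, b), in order."""
    values = sorted(set(W))
    blocks = []
    last_end = -1   # index of the right end of the last emitted block
    j = 0
    for i, a in enumerate(values):
        j = max(j, i)
        # Extend the right end while the values stay similar to a.
        while j + 1 < len(values) and values[j + 1] - a <= theta:
            j += 1
        # An interval ending where the previous block ends is not maximal.
        if j > last_end:
            blocks.append((a, values[j]))
            last_end = j
    return blocks


W = [0, 1, 2, 6, 7, 8, 9]   # attribute values of Table 1
```

On the values of Table 1, this reproduces the two sets of blocks of Example 6 for `theta = 1` and `theta = 2`.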

Then, we can apply the same context derivation as in the previous section:
scaling is still based on intervals, but this time it uses tolerance blocks.

Definition 9 (TriMax triadic scaled context). Let
$Y \subseteq G \times M \times C$ be a ternary relation. Then $(g, m, c) \in Y$
iff $(m(g), c) \in J$, or simply $m(g) \in c$, where $J$ is the scale relation.
$(G, M, C, Y)$ is called the TriMax triadic scaled context.

Example 7. Table 5 is the TriMax triadic scaled context derived from the
numerical dataset of Table 1 with $\theta = 1$.





      label 1: [0,1]    label 2: [1,2]    label 3: [6,7]    label 4: [7,8]    label 5: [8,9]
      m1 m2 m3 m4 m5    m1 m2 m3 m4 m5    m1 m2 m3 m4 m5    m1 m2 m3 m4 m5    m1 m2 m3 m4 m5
g1    x  .  .  x  .     x  x  x  x  .     .  .  .  .  x     .  .  .  .  .     .  .  .  .  .
g2    .  x  x  x  .     x  x  x  .  .     .  .  .  .  x     .  .  .  .  .     .  .  .  .  .
g3    .  .  x  .  .     x  x  x  .  .     .  .  .  x  x     .  .  .  x  .     .  .  .  .  .
g4    .  .  .  .  .     .  .  x  .  .     .  .  .  x  x     x  .  .  .  x     x  x  .  .  .

(x: cross; .: empty cell)

Table 5: Triadic scaled context using tolerance blocks over W and $\theta = 1$.



Definition 10 (Dyadic context associated with a block of tolerance).
Consider a block of tolerance $c \in C$. The dyadic context associated with this
block is given by $(G, M, Z)$, where $Z$ denotes the set of all pairs
$(g, m) \in G \times M$ such that $m(g) \in c$.

Example 8. In Table 5, each such dyadic context is labelled by its corresponding
block of tolerance.
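The dyadic contexts of Definition 10 are straightforward to materialize. The sketch below does so for Table 1 with the blocks of Example 6 ($\theta = 1$); the names `BLOCKS`, `dyadic_context` and `CONTEXTS` are ours, and the labels 1 to 5 follow the order of the blocks.

```python
# Table 1 of the paper.
DATA = {
    "g1": {"m1": 1, "m2": 2, "m3": 2, "m4": 1, "m5": 6},
    "g2": {"m1": 2, "m2": 1, "m3": 1, "m4": 0, "m5": 6},
    "g3": {"m1": 2, "m2": 2, "m3": 1, "m4": 7, "m5": 6},
    "g4": {"m1": 8, "m2": 9, "m3": 2, "m4": 6, "m5": 7},
}

# Blocks of tolerance for theta = 1 (Example 6), as (low, high) pairs.
BLOCKS = [(0, 1), (1, 2), (6, 7), (7, 8), (8, 9)]


def dyadic_context(data, block):
    """Set Z of all pairs (g, m) whose value m(g) lies in the given block."""
    low, high = block
    return {(g, m)
            for g, row in data.items()
            for m, value in row.items()
            if low <= value <= high}


# One dyadic context per block, labelled 1..5 following the block order.
CONTEXTS = {label: dyadic_context(DATA, block)
            for label, block in enumerate(BLOCKS, start=1)}

# Labels of the contexts in which the pair (g3, m4) holds.
labels_g3_m4 = [label for label, Z in CONTEXTS.items() if ("g3", "m4") in Z]
```

The pair `(g3, m4)`, whose value is 7, appears in the contexts labelled 3 and 4, since 7 belongs to both $[6, 7]$ and $[7, 8]$.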

Now, remark that blocks of tolerance over $W$ are totally ordered: let
$[v_1, v_2]$ and $[w_1, w_2]$ be two blocks of tolerance; one has
$[v_1, v_2] \le [w_1, w_2]$ iff $v_1 \le w_1$. Hence, the associated dyadic
contexts are also totally ordered, and we use a corresponding indexing set to
label them. In Table 5, the contexts for blocks ([0, 1], [1, 2], [6, 7], [7, 8],
[8, 9]) are respectively labelled (1, 2, 3, 4, 5).

We now present our second main result: the scaled triadic context supports
the extraction of maximal biclusters of similar values for a given $\theta$. In
this case however, existing algorithms of TCA cannot be applied directly. For
example, in Table 5 the triconcept $(\{g_3\}, \{m_4\}, \{3, 4\})$ corresponds to
a bicluster of similar values which is not maximal. Hence we present hereafter a
new TCA algorithm for this task, called TriMax.



The basic idea of TriMax relies on the following facts. Firstly, since each
dyadic context corresponds to a block of tolerance, we do not need to compute
intersections of contexts, as is classically done in TCA. Hence each dyadic
context is processed separately. Secondly, a dyadic concept of a dyadic context
necessarily represents a bicluster of similar values, but we cannot be sure that
it is maximal (see the previous example). Hence, we need to check whether a
concept is still a concept in the other dyadic contexts, corresponding to other
classes of tolerance. This is made precise with the following proposition.

Proposition 2. Let $(A, B, U)$ be a triadic concept from the TriMax triadic
scaled context $(G, M, C, Y)$, such that $U$ is the outer closure of a singleton
$\{c\} \subseteq C$. If $|U| = 1$, $(A, B)$ is a maximal bicluster of similar
values. Otherwise, $(A, B)$ is a maximal bicluster of similar values iff
$\nexists y \in [\min(U); \max(U)]$, $y \neq c$, s.t.
$(A, B) \neq \Psi_y(\Psi'_y((A, B)))$, where $\Psi_y(.)$ and $\Psi'_y(.)$
correspond to the inner derivation operators associated with the $y$-th dyadic
context.

Proof. When $|U| = 1$, $(A, B)$ is a dyadic concept in only one dyadic context,
corresponding to a block of tolerance. By the properties of tolerance blocks,
$(A, B)$ is a maximal bicluster. If $|U| \neq 1$, $(A, B)$ is a dyadic concept
in $|U|$ dyadic contexts. Since the set of tolerance blocks is totally ordered,
it directly follows that the modus $U$ is an interval $[\min(U); \max(U)]$.
Hence, if $\exists y \in [\min(U); \max(U)]$ s.t.
$(A, B) \neq \Psi_y(\Psi'_y((A, B)))$, this means that $(A, B)$ is not a maximal
bicluster of similar values.

Description of the TriMax algorithm. TriMax starts by scaling the initial
numerical data into several dyadic contexts, each one standing for a block of
tolerance over $W$ with the given $\theta$. The set of all dyadic contexts
accordingly forms a triadic context. Then, each dyadic context is mined with any
FCA algorithm (or closed itemset mining algorithm), and all formal concepts are
extracted. For a given concept $(A, B)$, we compute the outer derivation
$\Phi'((A, B))$, i.e. we obtain the set of labels of the dyadic contexts in which
the current dyadic concept holds. If it results in a singleton, this means that
$(A, B)$ is a concept for the current block of tolerance only, i.e. it is a
maximal bicluster of similar values, and it has not been and will never be
generated twice. Otherwise, $(A, B)$ holds in other contexts, and can accordingly
be generated several times (as many times as the number of contexts in which it
holds). Then, we only consider $(A, B)$ if we are sure that it is the last time
it is computed. Finally, we need to check whether the current concept represents
a maximal bicluster, i.e. there should not exist a context from the modus where
$(A, B)$ is not a dyadic concept.

Proposition 3. TriMax outputs a (i) complete, (ii) correct and (iii)
non-redundant collection of all maximal biclusters of similar values for a given
numerical dataset and similarity parameter $\theta$.

Proof. (i) and (ii) follow directly from Proposition 2. Statement (iii) is
ensured by the second if condition of the algorithm: a dyadic concept (or
equivalently a bicluster) is considered iff it has been extracted in the last
dyadic context in which it holds.



Algorithm 1: TriMax

input : Numerical dataset $(G, M, W, I)$, tolerance parameter $\theta$
output: Maximal biclusters of similar values

Let $C = \{[a_i, b_i]\}$ be the totally ordered set of all blocks over $W$ for the given $\theta$.
Indices $i$ form an indexing set.
forall the $[a_i, b_i] \in C$ do
    Build context $(G, M, Z_i)$ such that $(g, m) \in Z_i \iff m(g) \in [a_i, b_i]$
forall the $(G, M, Z_i)$ do
    Use any FCA algorithm to extract all its concepts $(A, B)$
    forall the dyadic concepts $(A, B)$ in the current context $(G, M, Z_i)$ do
        if $|\Phi'((A, B))| = 1$ then
            print $(A, B)$
        else if $\max(\Phi'((A, B))) = i$ then
            $x \leftarrow \min(\Phi'((A, B)))$
            if $\nexists y \in [x, i[$ s.t. $(A, B) \neq \Psi_y(\Psi'_y((A, B)))$ then
                print $(A, B)$
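The steps of Algorithm 1 can be sketched in Python on a small dataset such as Table 1. This is a didactic sketch under stated assumptions, not the authors' C++ implementation: concepts are enumerated naively from all attribute subsets (where the paper uses InClose), only concepts with non-empty extent and intent are kept, and every function name is ours.

```python
from itertools import combinations

# Table 1 of the paper.
DATA = {
    "g1": {"m1": 1, "m2": 2, "m3": 2, "m4": 1, "m5": 6},
    "g2": {"m1": 2, "m2": 1, "m3": 1, "m4": 0, "m5": 6},
    "g3": {"m1": 2, "m2": 2, "m3": 1, "m4": 7, "m5": 6},
    "g4": {"m1": 8, "m2": 9, "m3": 2, "m4": 6, "m5": 7},
}


def tolerance_blocks(values, theta):
    """Maximal intervals (a, b) over the sorted values with b - a <= theta."""
    values = sorted(set(values))
    blocks, last_end, j = [], -1, 0
    for i, a in enumerate(values):
        j = max(j, i)
        while j + 1 < len(values) and values[j + 1] - a <= theta:
            j += 1
        if j > last_end:  # otherwise contained in the previous block
            blocks.append((a, values[j]))
            last_end = j
    return blocks


def extent(Z, B, objects):
    return frozenset(g for g in objects if all((g, m) in Z for m in B))


def intent(Z, A, attrs):
    return frozenset(m for m in attrs if all((g, m) in Z for g in A))


def concepts(Z, objects, attrs):
    """Naive enumeration of the concepts with non-empty extent and intent."""
    found = set()
    for size in range(1, len(attrs) + 1):
        for seed in combinations(attrs, size):
            A = extent(Z, seed, objects)
            if A:
                found.add((A, intent(Z, A, attrs)))
    return found


def trimax(data, theta):
    objects = sorted(data)
    attrs = sorted(next(iter(data.values())))
    values = {v for row in data.values() for v in row.values()}
    # One dyadic context per block of tolerance, in block order.
    contexts = [
        {(g, m) for g, row in data.items() for m, v in row.items() if a <= v <= b}
        for a, b in tolerance_blocks(values, theta)
    ]
    result = []
    for i, Z in enumerate(contexts):
        for A, B in concepts(Z, objects, attrs):
            # Outer derivation: indices of the contexts containing the box A x B.
            modus = [y for y, Zy in enumerate(contexts)
                     if all((g, m) in Zy for g in A for m in B)]
            if len(modus) == 1:
                result.append((A, B))
            elif max(modus) == i:
                # Keep (A, B) only if it is a concept in every earlier context.
                if all(extent(contexts[y], B, objects) == A
                       and intent(contexts[y], A, attrs) == B
                       for y in range(min(modus), i)):
                    result.append((A, B))
    return result
```

With `theta = 1`, the output contains `({g1, g2, g3}, {m1, m2, m3})` but not `({g3}, {m4})`: the latter is a concept in the context of block $[7, 8]$ but not in the context of block $[6, 7]$, where `g4` also has `m4`.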



6 Computer experiments 

In this section, we experiment with the algorithm TriMax and highlight various
aspects of its practical complexity.

Data. We explore a gene expression dataset of the species Laccaria bicolor
available at NCBI (see footnote 5). More details on this dataset can be found
in [3]. This gene expression dataset monitors the behaviour of 11,930 genes in
12 biological situations, reflecting various stages of the Laccaria bicolor
biological cycle. Attribute values in $W$ vary between 0 and 60,000.

TriMax implementation. TriMax is written in C++. It uses the BOOST
library 1.42 for data structures and the implementation of InClose from its
authors (see footnote 6) for the extraction of dyadic concepts. At each
iteration of the main loop, i.e. for each tolerance block, the current scaled
dyadic context is produced: we do not generate the whole triadic context, which
cannot fit into memory for large databases. It turns out that the modus
computation for a given dyadic concept requires computing the scaling "on the
fly", i.e. when computing the set of dyadic contexts in which the current concept
holds. The experiments were carried out on an Intel CPU 2.54 GHz machine with
8 GB RAM running under Ubuntu 11.04.

Experiment settings. The goal of the present experiments is not to give a
qualitative evaluation of the approach (say, a biological interpretation),
but rather a quantitative one. Indeed, the present work aims at showing
how an existing type of biclusters can be mined with Triadic Concept Analysis.
For a qualitative evaluation, the reader may refer for example to [1,9].

5 http://www.ncbi.nlm.nih.gov/geo/ as series GSE9784

6 http://sourceforge.net/projects/inclose/

[Figure 1 consists of six plots: (i) numbers of patterns (Y-axis) w.r.t.
$\theta$ (X-axis) and $|G|$ (Z-axis); (ii) execution times in seconds (Y-axis)
w.r.t. $\theta$ (X-axis) and $|G|$ (Z-axis); (iii) numbers of blocks of
tolerance (Y-axis) w.r.t. $\theta$ (X-axis) and $|G|$ (Z-axis); (iv) density of
triadic contexts (Y-axis) w.r.t. $\theta$ (X-axis) and $|G|$ (Z-axis);
(v) number of generated dyadic concepts compared with the actual number of
maximal biclusters, varying $\theta$ with $|G| = 500$; (vi) distribution of
execution time over the main steps of TriMax with $\theta = 33,000$ and
$|G| = 500$.]

Fig. 1: Monitoring with different settings (i) the number of maximal biclusters,
(ii) the execution times of TriMax, (iii) the number of tolerance blocks, (iv)
the derived triadic context density, (v) the number of non-maximal biclusters
generated as dyadic concepts w.r.t. the number of maximal biclusters, and (vi)
the distribution of execution time in the TriMax algorithm.

Accordingly, we designed the following experiments to monitor various
aspects of the TriMax algorithm. For most of the experiments, the dataset used
is composed of an increasing number of objects and all attributes. The objects
are chosen randomly once and for all, so that the results of the different
experiments can be compared. We also vary the parameter $\theta$ in the same way
across all experiments. Then, we monitor the following aspects, as presented in
Figure 1:

i. Number of maximal biclusters of similar values.

ii. Execution time (in seconds).

iii. Number of tolerance blocks.

iv. Density of the triadic context, where density is defined as
$d(G, M, C, Y) = |Y| / (|G| \times |M| \times |C|)$. This information is
important, since contexts with high density are known to be hard to process with
FCA algorithms [11], and we use the InClose algorithm for processing dyadic
contexts.

v. Comparison between the number of non-maximal biclusters produced by
TriMax (i.e. dyadic concepts that do not correspond to maximal biclusters)
and the number of maximal biclusters.

vi. Execution time profiling of the main procedures of TriMax. This is achieved
with the tool GNU GProf and tells us which parts of the algorithm are
the most time consuming.
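As a small worked instance of the density defined in item iv, the sketch below computes it for the TriMax context of Table 1. It is illustrative only: the blocks are hard-coded from Example 6 and from the extreme cases $\theta = 0$ and $\theta = \max(W)$, and the function name `density` is ours.

```python
# Table 1 of the paper.
DATA = {
    "g1": {"m1": 1, "m2": 2, "m3": 2, "m4": 1, "m5": 6},
    "g2": {"m1": 2, "m2": 1, "m3": 1, "m4": 0, "m5": 6},
    "g3": {"m1": 2, "m2": 2, "m3": 1, "m4": 7, "m5": 6},
    "g4": {"m1": 8, "m2": 9, "m3": 2, "m4": 6, "m5": 7},
}

BLOCKS_THETA_0 = [(w, w) for w in (0, 1, 2, 6, 7, 8, 9)]     # |C| = |W|
BLOCKS_THETA_1 = [(0, 1), (1, 2), (6, 7), (7, 8), (8, 9)]    # Example 6
BLOCKS_THETA_MAX = [(0, 9)]                                  # theta = max(W)


def density(data, blocks):
    """d(G, M, C, Y) = |Y| / (|G| x |M| x |C|) for the TriMax scaled context."""
    n_objects = len(data)
    n_attrs = len(next(iter(data.values())))
    crosses = sum(1
                  for row in data.values()
                  for value in row.values()
                  for low, high in blocks
                  if low <= value <= high)
    return crosses / (n_objects * n_attrs * len(blocks))
```

For Table 1 with $\theta = 1$, the context of Table 5 holds 28 crosses out of $4 \times 5 \times 5 = 100$ cells, i.e. a density of 0.28. With a single block covering all values the density is 1, and with one block per value it drops to $1/|W|$.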

Experiment results. Figure 1 presents the results of our experiments with
different settings. In these settings, we vary the number of objects $|G|$ and
the parameter $\theta$. A first observation arises from graph (i): the number of
biclusters is the highest when $\theta \approx 30,000$. A first explanation is
that 30,000 is half of the maximal value of $W$, and almost all multiples of 100
in $[0; 60,000]$ belong to $W$. In graph (ii), the execution time has the same
behaviour as in graph (i). These results can be understood by paying attention
to the next graphs, (iii) and (iv). Graph (iii) monitors the number of tolerance
blocks. The maximal number is reached when $\theta = 0$, i.e. $|C| = |W|$. When
$\theta = \max(W)$, we have $|C| = 1$. Now we observe in (iv) that the density
follows the reverse behaviour: when $\theta = 0$, the density tends towards 0%;
when $\theta = \max(W)$, the density is exactly 100%, since the single block
contains every value. Combining graphs (iii) and (iv), the worst cases happen
when both the density and the number of tolerance blocks are high.

Another observation, which also explains the execution times, arises from
graph (v). It compares the number of maximal biclusters with the number of
non-maximal biclusters generated as dyadic concepts. Here again, the worst case
is reached when $\theta \approx 30,000$. Looking at graph (vi), we learn that
this is however not the major problem. The most time-consuming procedure of
TriMax is the computation of the modus of a dyadic concept. The explanation is
that we compute the modus with "on the fly" scaling.

Therefore, the bottleneck of our algorithm turns out to be the modus
computation. In practical applications however, the analyst is not interested in
all biclusters of similar values. Some constraints are generally defined, such as
a minimal (resp. maximal) number of objects (resp. attributes) in a bicluster
$(A, B)$, or a minimal area $|A| \times |B|$, etc. Interestingly, most of those
constraints can be evaluated on a generated dyadic concept. Therefore, before
computing the modus of such a concept, we can check these properties and discard
the concept if it does not respect the constraints. Although not reported in this
paper, we tested how adding minimal (resp. maximal) size constraints on a
bicluster affects both the number of biclusters and the execution times. The
results are very interesting: for example with $\theta = 33,000$, $|G| = 500$,
and the minimal (resp. maximal) size for $|A|$ set to 10 (resp. 40), TriMax
produces only 5,332 maximal biclusters in 2.1 seconds, compared to 104,226
maximal biclusters extracted in 16.130 seconds without any constraint.
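The constraint check described above only needs the sizes of the extent and of the intent, so it can run before the costly modus computation. A minimal sketch, assuming concepts are given as `(A, B)` pairs of sets; the function and parameter names are hypothetical and are not those of the TriMax implementation.

```python
def satisfies_constraints(A, B, min_objects=1, max_objects=None,
                          min_attrs=1, max_attrs=None, min_area=1):
    """Size constraints that can be evaluated on a dyadic concept (A, B) alone."""
    if len(A) < min_objects or len(B) < min_attrs:
        return False
    if max_objects is not None and len(A) > max_objects:
        return False
    if max_attrs is not None and len(B) > max_attrs:
        return False
    return len(A) * len(B) >= min_area


def filter_concepts(concepts, **constraints):
    """Keep only the concepts whose modus is worth computing."""
    return [(A, B) for A, B in concepts
            if satisfies_constraints(A, B, **constraints)]


# Hypothetical dyadic concepts, e.g. as produced for one block of tolerance.
candidates = [
    ({"g1", "g2", "g3"}, {"m1", "m2", "m3"}),
    ({"g3"}, {"m4"}),
    ({"g1", "g2", "g3", "g4"}, {"m3"}),
]
kept = filter_concepts(candidates, min_objects=2, max_objects=3)
```

In this toy run only the first candidate is kept: the second has too few objects and the third too many.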

Finally, the most interesting aspect of TriMax is that its computation can be
distributed directly. Indeed, each iteration, i.e. the processing of one block
of tolerance, can be carried out independently of the others. Furthermore, the
core of TriMax, which consists in extracting dyadic concepts, can also be
distributed, see e.g. [10]. A deeper investigation remains to be done in this
case. Note that although the method description involves $W$ as a set of
natural numbers, TriMax can directly handle real-valued numerical data, and has
been implemented as such.
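
As an illustration of why the iterations are independent, the sketch below
enumerates the blocks of tolerance of a small set of values and hands each
block to a separate worker. It assumes that two values are similar when their
absolute difference is at most $\theta$, so that a block is a maximal set of
pairwise similar values. The names `tolerance_blocks` and `process_block` are
hypothetical, the input values are made up, and `process_block` is only a
placeholder for the per-block work of TriMax.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def tolerance_blocks(values: Sequence[float], theta: float) -> List[Tuple[float, ...]]:
    """Maximal sets of values whose pairwise difference is at most theta."""
    ordered = sorted(set(values))
    blocks: List[Tuple[float, ...]] = []
    end = 0  # exclusive right end of the sliding window
    for start in range(len(ordered)):
        prev_end = end
        while end < len(ordered) and ordered[end] - ordered[start] <= theta:
            end += 1
        # A window is maximal only if it reaches further right than the previous one.
        if start == 0 or end > prev_end:
            blocks.append(tuple(ordered[start:end]))
    return blocks


def process_block(block: Tuple[float, ...]) -> Tuple[float, float]:
    """Placeholder for one iteration: here it only reports the value range."""
    return (min(block), max(block))


W = [0, 1, 2, 6, 7, 8, 9]  # illustrative values
blocks = tolerance_blocks(W, theta=1)

# No block needs the result of another one, so they can be mapped in parallel.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(process_block, blocks))
```

A thread pool is used only to keep the sketch short; for CPU-bound work in
CPython, a process pool or separate machines would be the more natural choice.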

Comparison with existing methods. Two existing methods in the literature
also consider the problem of extracting all maximal biclusters of similar values
from a numerical dataset. The first method is called Numerical Biset Miner
(NBS-Miner [1]). The second method is based on interval pattern structures
(IPS [7,8]). Due to space limitations, we do not detail these methods. Both
NBS-Miner and IPS algorithms have been implemented in C++. First experiments
show that NBS-Miner is not scalable compared to IPS and TriMax. On the other
hand, it seems that TriMax outperforms IPS, but a deeper investigation is
required. The main problem in IPS is to find an efficient algorithm able to
compute tolerance blocks over a set of intervals.

7 Conclusion 

We addressed the problem of biclustering numerical data with Formal Concept
Analysis. So-called (maximal) biclusters of similar values can be characterized
and extracted with Triadic Concept Analysis, which turns out to be a novel
mathematical framework for this task. We properly defined a scaling procedure
turning original numerical data into triadic contexts from which biclusters can
be extracted as triadic concepts with existing algorithms. This approach allows
a correct, complete and non-redundant extraction of all maximal biclusters, for
any similarity parameter $\theta$, and can be extended to n-ary numerical
datasets, while the computation can be directly distributed. The interpretation
of triadic concepts is very rich: the extent and intent together characterize a
bicluster (i.e. the rectangle), while the modus gives the range of values of the
bicluster and the $\theta$ for which the bicluster is maximal. Moreover, the
larger the modus, the more similar the values within the current bicluster.
This opens a research perspective: extracting the top-k frequent tri-concepts
with Data-Peeler [2], which can help to handle the problem of top-k bicluster
extraction. We also adapted the TCA machinery with the algorithm TriMax to
extract maximal biclusters for a user-defined $\theta$, which is the classical
setting in the existing literature. TriMax appears to be a fully customizable
algorithm: any concept extraction algorithm can be used inside its core (along
with several constraints on produced dyadic concepts), while its distributed
computation is direct. Among several other experiments, it remains to determine
which core algorithms are best for a given $\theta$ parameter, the latter
directly influencing the density of the derived contexts.

Acknowledgements. The authors would like to thank Dmitry Andreevich Morozov
for implementing the algorithms NBS-Miner and IPS. Mehdi Kaytoue was partially
supported by CNPq, Fapemig and the Brazilian National Institute for Science
and Technology for the Web (InWeb). Sergei O. Kuznetsov was supported by the
project of the Russian Foundation for Basic Research, grant no.
08-07-92497-NTsNIL_a. Juraj Macko acknowledges support by Grant No. 202/10/0262
of the Czech Science Foundation.

A Proof of Proposition 1

Before proving this proposition, we need to introduce the following. For the
sake of simplicity, we now consider $W$ as the set of all natural numbers that
are greater than or equal to the minimal value and lower than or equal to the
maximal value of a numerical dataset, i.e. $W = \{0, 1, 2, 3, 4, 5, 6, 7, 8, 9\}$
with the example of Table 1.

Definition 11 (Scale value and scale relation). We call scale value
$s = q - r$, where $r = \min(W)$ and $q = \max(W)$. The scale relation is a
binary relation $J \subseteq W \times T$, where $T = \{t_1, \ldots, t_{2s+1}\}$,
$r \leq w \leq q$, and $(w, t_i) \in J$ iff $i \in [w - r + 1, w - r + 1 + s]$.
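
For the running example $W = \{0, \ldots, 9\}$, Definition 11 can be checked
with a few lines of Python. This is only an illustrative sketch: the function
name `scale_relation` and the representation of $J$ as a dictionary mapping
each value $w$ to the set of indices $i$ with $(w, t_i) \in J$ are assumptions
made here.

```python
from typing import Dict, Sequence, Set


def scale_relation(values: Sequence[int]) -> Dict[int, Set[int]]:
    """Map each w in [r, q] to the indices i such that (w, t_i) is in J."""
    r, q = min(values), max(values)
    s = q - r  # scale value
    # (w, t_i) in J iff i lies in [w - r + 1, w - r + 1 + s] (bounds included).
    return {w: set(range(w - r + 1, w - r + 1 + s + 1)) for w in range(r, q + 1)}


J = scale_relation(range(10))  # r = 0, q = 9, s = 9
all_indices = set().union(*J.values())
```

Each value is related to $s + 1 = 10$ consecutive scale attributes, and the
indices used overall run from 1 to $2s + 1 = 19$, as required by the definition
of $T$.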

Note that $J$ is equivalent to the interordinal scale of $W$ given previously;
this notation is used only for the proof.

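Definition 12 ($E_{\theta w}$ - cluster base). We introduce
$E_{\theta w} \subseteq T$ defined as
$E_{\theta w} = [t_{w + \theta - r + 1}; t_{w - r + 1 + s}]$ for given $\theta$
and $w \in W$.

Example 9 ($E_{\theta w}$ - cluster base). With $\theta = 1$, $w = 2$, $r = 0$
and $s = 9$: $E_{12} = [t_{2+1-0+1}; t_{2-0+1+9}] = [t_4; t_{12}]$.

Proposition 4. $(w_b = m(g)) \simeq_\theta (n(h) = w_c)$ iff
$(g, m) \in Y^{12}_{E_{\theta b}}$ and $(h, n) \in Y^{12}_{E_{\theta b}}$.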

Proof. Let $E_b, E_c \subseteq T$ and $w_c \geq w_b$. According to the
definition, $(g, m) \in Y^{12}_{E_{\theta b}}$ iff $m, g, t$ are related by $Y$
for all $t \in E_{\theta b}$. Using scaling and the definition we have
$[t_{w_b - r + 1}; t_{w_b - r + 1 + s}] = E_b \supseteq E_{\theta b} =
[t_{w_b + \theta - r + 1}; t_{w_b - r + 1 + s}]$, which is straightforward. We
just need to show that $(h, n) \in Y^{12}_{E_{\theta b}}$ holds as well. With
the scaling definition and the previous definition we get
$[t_{w_c - r + 1}; t_{w_c - r + 1 + s}] = E_c \supseteq E_{\theta b} =
[t_{w_b + \theta - r + 1}; t_{w_b - r + 1 + s}]$, holding iff
$w_c - w_b \leq \theta$, which is exactly the definition of $\simeq_\theta$.

Moreover, we can easily see as a corollary that $w_c - w_b < \theta$ holds iff
$E_b \cap E_c \supset E_{\theta b}$, and $w_c - w_b = \theta$ holds iff
$E_b \cap E_c = E_{\theta b}$. Now we can prove Proposition 1 from the main
text.
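
The interval arguments used for Proposition 4 and its corollary can be checked
exhaustively on the running example $W = \{0, \ldots, 9\}$. The sketch below is
only an illustration: sets of indices stand for intervals of scale attributes,
the helper names `interval`, `scale_image` and `cluster_base` are hypothetical,
and $\theta$ is assumed to range over $0, \ldots, s$.

```python
from itertools import product


def interval(lo: int, hi: int) -> set:
    """Indices of the scale attributes t_lo, ..., t_hi (empty if lo > hi)."""
    return set(range(lo, hi + 1))


def scale_image(w: int, r: int, s: int) -> set:
    """Indices i with (w, t_i) in J, i.e. the set written E_b or E_c in the proof."""
    return interval(w - r + 1, w - r + 1 + s)


def cluster_base(theta: int, w: int, r: int, s: int) -> set:
    """E_{theta w} = [t_{w + theta - r + 1}; t_{w - r + 1 + s}]."""
    return interval(w + theta - r + 1, w - r + 1 + s)


r, q = 0, 9
s = q - r

example_9 = cluster_base(1, 2, r, s)  # E_12 from Example 9

prop4_ok, strict_ok, equal_ok = [], [], []
for theta, w_b, w_c in product(range(s + 1), range(r, q + 1), range(r, q + 1)):
    if w_c < w_b:
        continue  # the proof assumes w_c >= w_b
    E_b, E_c = scale_image(w_b, r, s), scale_image(w_c, r, s)
    E_tb = cluster_base(theta, w_b, r, s)
    diff = w_c - w_b
    # Proposition 4: similarity iff both images contain the cluster base.
    prop4_ok.append((diff <= theta) == (E_b >= E_tb and E_c >= E_tb))
    # Corollary: strict inequality iff proper superset, equality iff equal sets.
    strict_ok.append((diff < theta) == ((E_b & E_c) > E_tb))
    equal_ok.append((diff == theta) == ((E_b & E_c) == E_tb))
```

In Python, `>=` on sets tests for a superset and `>` for a proper superset,
which matches the two inclusion symbols used in the proof and in the corollary.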

Proposition 1. Tuple $(A_1, A_2, U)$, where $A_1 \subseteq G$,
$A_2 \subseteq M$ and $U \subseteq T$, is a triadic concept iff $(A_1, A_2)$ is
a maximal bicluster of similar values for some $\theta \geq 0$. Furthermore,
the value of $\theta$ is defined as $\theta = s - |U| + 1$.

Proof. Let $U = E_{\theta b}$ and consider the dyadic context
$Y^{12}_U = Y^{12}_{E_{\theta b}}$ for some $w_b$. Using the dyadic closure
operator $\Phi(\Psi(A_1))$ we get $(A_1, A_2)$. From the definition of a
triconcept we know that $A_1 \subseteq B_1$ implies $A_1 = B_1$ (the same for
$A_2$). From the definition of a maximal bicluster of similar values we know
that $(A_1, A_2)$ is maximal when there does not exist $(B_1, B_2)$ s.t.
$B_1 \supset A_1$ (the same applies for $A_2$). It is obvious that both sets
are maximal by definition when we have the same dyadic context
$Y^{12}_U = Y^{12}_{E_{\theta b}}$. Now we need to look at the dyadic context
$Y^{12}_U = Y^{12}_{E_{\theta b}}$. From
$|U| = |E_{\theta b}| = |[t_{w_b + \theta - r + 1}; t_{w_b - r + 1 + s}]|$ we
can easily see that $|U| = s - \theta + 1$, which gives
$\theta = s - |U| + 1$.

Finally, $U$ is maximal (being the modus of a triconcept) and $E_{\theta b}$ is
maximal as well, because $w_c - w_b \leq \theta$ holds iff
$E_b \cap E_c \supseteq E_{\theta b}$. All facts mentioned in this proof lead
to the equality of the triconcept and the maximal bicluster of similar values.
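
As a sanity check of the counting step, the size of the interval can be written
out explicitly and applied to Example 9, where $r = 0$, $s = 9$, $w_b = 2$ and
$\theta = 1$. The derivation below only restates the formulas of the proof.

```latex
\begin{align*}
|E_{\theta b}| &= (w_b - r + 1 + s) - (w_b + \theta - r + 1) + 1 = s - \theta + 1, \\
U = [t_4; t_{12}] &\;\Rightarrow\; |U| = 12 - 4 + 1 = 9
  \;\Rightarrow\; \theta = s - |U| + 1 = 9 - 9 + 1 = 1.
\end{align*}
```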